BAYESIAN MODELING LONGITUDINAL DYADIC DATA WITH
NONIGNORABLE DROPOUT, WITH APPLICATION TO 
A BREAST CANCER STUDY 

By Guangyu Zhang and Ying Yuan 1 

University of Maryland and 
University of Texas MD Anderson Cancer Center 

Dyadic data are common in the social and behavioral sciences, 
in which members of dyads are correlated due to the interdepen- 
dence structure within dyads. The analysis of longitudinal dyadic 
data becomes complex when nonignorable dropouts occur. We pro- 
pose a fully Bayesian selection-model-based approach to analyze lon- 
gitudinal dyadic data with nonignorable dropouts. We model re- 
peated measures on subjects by a transition model and account for 
within-dyad correlations by random effects. In the model, we allow 
a subject's outcome to depend on his/her own characteristics and measurement
history, as well as those of the other member in the dyad. We
further account for the nonignorable missing data mechanism using 
a selection model in which the probability of dropout depends on the 
missing outcome. We propose a Gibbs sampler algorithm to fit the 
model. Simulation studies show that the proposed method effectively 
addresses the problem of nonignorable dropouts. We illustrate our 
methodology using a longitudinal breast cancer study. 

1. Introduction. Dyadic data are common in psychosocial and behav- 
ioral studies [Kenny, Kashy and Cook (2006)]. Many social phenomena, such 
as dating and marital relationships, are interpersonal by definition, and, as 
a result, related observations do not refer to a single person but rather 
to both persons involved in the dyadic relationship. Members of dyads of- 
ten influence each other's cognitions, emotions and behaviors, which leads 
to interdependence in a relationship. For example, a husband's (or wife's) 
drinking behavior may lead to lowered marital satisfaction for the wife (or 
husband). A consequence of interdependence is that observations of the two 
individuals are correlated. For example, the marital satisfaction scores of
husbands and wives tend to be positively correlated. One of the primary 
objectives of relationship research is to understand the interdependence of 
individuals within dyads and how the attributes and behaviors of one dyad 
member impact the outcome of the other dyad member. 

In many studies, dyadic outcomes are measured over time, resulting in 
longitudinal dyadic data. Repeatedly measuring dyads brings in two compli- 
cations. First, in addition to the within-dyad correlation, repeated measures 
on each subject are also correlated, that is, within-subject correlation. When 
analyzing longitudinal dyadic data, it is important to account for these two 
types of correlations simultaneously; otherwise, the analysis results may be 
invalid. The second complication is that longitudinal dyadic data are prone 
to the missing data problem caused by dropout, whereby subjects are lost 
to follow-up and their responses are not observed thereafter. In psychosocial 
dyadic studies, the dropouts are often nonignorable or informative in the 
sense that the dropout depends on missing values. In the presence of the 
nonignorable dropouts, conventional statistical methods may be invalid and 
lead to severely biased estimates [Little and Rubin (2002)]. 

There is extensive literature on statistical modeling of nonignorable drop- 
outs in longitudinal studies. Based on different factorizations of the likeli- 
hood of the outcome process and the dropout process, Little (1995) identified 
two broad classes of likelihood-based nonignorable models: selection mod- 
els [Wu and Carroll (1988); Diggle and Kenward (1994); Follman and Wu 
(1995); Glynn, Laird and Rubin (1986)] and pattern mixture models [Wu 
and Bailey (1989); Little (1993, 1994); Hogan and Laird (1997); Roy (2003); 
Hogan, Lin and Herman (2004)]. Other likelihood-based approaches that 
do not directly belong to this classification have also been proposed in the 
literature, for example, the mixed-effects hybrid model by Yuan and Little 
(2009) and a class of nonignorable models by Tsonaka et al. (2010). An- 
other general approach for dealing with nonignorable dropouts is based on 
estimating equations and includes Robins, Rotnitzky and Zhao (1995),
Rotnitzky, Robins and Scharfstein (1998), Scharfstein, Rotnitzky and Robins
(1999) and Farewell (2010). Recent reviews of methods handling nonignorable
dropouts in longitudinal data can be found in Verbeke and Molenberghs
(2000), Molenberghs and Kenward (2007), Little (2009), Ibrahim and
Molenberghs (2009) and Daniels and Hogan (2008). In spite of the rich body of
literature noted above, to the best of our knowledge, the nonignorable dropout
problem has not been addressed in the context of longitudinal dyadic data. 
The interdependence structure within dyads brings new challenges to this 
missing data problem. For example, within dyads, one member's outcome 
often depends on his/her covariates, as well as the other member's outcome 
and covariates. Thus, the dropout of the other member in the dyad causes 
not only a missing (outcome) data problem for that member, but also a miss- 
ing (covariate) data problem for the member who remains in the study. 

We propose a fully Bayesian approach to deal with longitudinal dyadic
data with nonignorable dropouts based on a selection model. Specifically, 
we model each subject's longitudinal measurement process using a transi- 
tion model, which includes both the patient's and spouse's characteristics 
as covariates in order to capture the interdependence between patients and 
their spouses. We account for the within-dyad correlation by introducing 
dyad-specific random effects into the transition model. To accommodate the 
nonignorable dropouts, we take the selection model approach by directly 
modeling the relationship between the dropout process and missing out- 
comes using a discrete time survival model. 

The remainder of the article is organized as follows. In Section 2 we 
describe our motivating data collected from a longitudinal dyadic breast 
cancer study. In Section 3 we propose a Bayesian selection-model-based
approach for longitudinal dyadic data with informative nonresponse, and
provide estimation procedures using a Gibbs sampler in Section 4. In Section 5
we present simulation studies to evaluate the performance of the proposed 
method. In Section 6 we illustrate our method by analyzing a breast cancer 
data set and we provide conclusions in Section 7. 

2. A motivating example. Our research is motivated by a single-arm 
dyadic study focusing on physiological and psychosocial aspects of pain 
among patients with breast cancer and their spouses [Badr et al. (2010)]. For 
individuals with breast cancer, spouses are most commonly reported as being 
the primary sources of support [Kilpatrick et al. (1998)], and spousal support 
is associated with lower emotional distress and depressive symptoms in these 
patients [Roberts et al. (1994)]. One specific aim of the study is to charac- 
terize the depression experience due to metastatic breast cancer from both 
patients' and spouses' perspectives, and examine the dyadic interaction and 
interdependence of patients and spouses over time regarding their depres- 
sion. The results will be used to guide the design of an efficient prevention 
program to decrease depression among patients. For example, conventional 
prevention programs typically apply interventions to patients directly. How- 
ever, if we find that the patient's depression depends on both her own and 
spouse's previous depression history and chronic pain, when designing a pre- 
vention program to improve the depression management and pain relief, we 
may achieve better outcomes by targeting both patients and spouses simul- 
taneously rather than targeting patients only. In this study, female patients 
who had initiated metastatic breast cancer treatment were approached by 
the project staff. Patients meeting the eligibility criteria (e.g., speaking
English, experiencing pain due to the breast cancer, having a male spouse or
significant other, being able to carry on pre-disease performance, being able
to provide informed consent) were asked to participate in the study on a
voluntary basis. Participation in the study would not affect their treatment
in any way.

Depression in patients and spouses was measured at three time points
(baseline, 3 months and 6 months) using the Center for Epidemiologic Stud- 
ies Depression Scale (CESD) questionnaires. However, a substantial number 
of dropouts occurred. Baseline CESD measurements were collected from 191 
couples; however, at 3 months, 101 couples (105 patients and 107 spouses) 
completed questionnaires, and at 6 months, 73 couples (76 patients and 
79 spouses) completed questionnaires. The missingness of the CESD mea- 
surements is likely related to the current depression levels of the patients 
or spouses; thus, a nonignorable missing data mechanism is assumed for
this study. Consequently, it is important to account for the nonignorable 
dropouts in this data analysis; otherwise, the results may be biased, as we 
will show in Section 6. 

3. Models. Consider a longitudinal dyadic study designed to collect J
repeated measurements of a response Y and a vector of covariates X for each
of n dyads. Let $Y_{kij}$, $X_{kij}$ and $H_{kij} = (y_{ki,j-1}, \ldots, y_{ki1})^T$
denote the outcome, $p \times 1$ covariate vector and outcome history,
respectively, for member k of dyad i at the jth measurement time, with
k = 1, 2; i = 1, ..., n; j = 1, ..., J. We assume that X is fully observed
(e.g., is external or fixed by study design), but Y is subject to missingness
due to dropout. The random variable $D_{ki}$, taking values from 2 to J + 1,
indicates the time at which member k of the ith dyad drops out, where
$D_{ki} = J + 1$ if the subject completes the study, and $D_{ki} = j$ if the
subject drops out between the (j - 1)th and jth measurement times, that is,
$\{y_{ki1}, \ldots, y_{ki,j-1}\}$ are observed and $\{y_{kij}, \ldots, y_{kiJ}\}$
are missing. We assume at least 1 observation for each subject, as subjects
without any observations have no information and are often excluded from
the analysis.

When modeling longitudinal dyadic data, we need to consider two types 
of correlations: the within-subject correlation due to repeated measures on 
a subject, and the within-dyad correlation due to the dyadic structure. We 
account for the first type of correlation by a transition model, and the second 
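type of correlation by dyad-specific random effects $b_i$, as follows:

$$
\begin{aligned}
Y_{1ij} \mid b_i &= b_i + \alpha_1 + H_{1ij}^T \beta_1 + H_{2ij}^T \gamma_1
  + X_{1ij}^T \tilde{\beta}_1 + X_{2ij}^T \tilde{\gamma}_1 + e_{1ij}, \\
Y_{2ij} \mid b_i &= b_i + \alpha_2 + H_{2ij}^T \beta_2 + H_{1ij}^T \gamma_2
  + X_{2ij}^T \tilde{\beta}_2 + X_{1ij}^T \tilde{\gamma}_2 + e_{2ij}, \\
b_i &\sim N(0, \tau_b^2).
\end{aligned}
\tag{3.1}
$$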

Regression parameters in this random-effects transition model have intu- 
itive interpretations similar to those of the actor-partner interdependence 
model, a conceptual framework proposed by Cook and Kenny (2005) to 
study dyadic relationships in the social sciences and behavior research fields. 
Specifically, $\beta_1$ and $\tilde{\beta}_1$ represent the "actor" effects of
the patient, which indicate how the outcome history and the covariates of the
patient (i.e., $H_{1ij}$ and $X_{1ij}$) affect her own current outcome, whereas
$\gamma_1$ and $\tilde{\gamma}_1$ represent the "partner" effects for the
patient, which indicate how the outcome history and the covariates of the
spouse (i.e., $H_{2ij}$ and $X_{2ij}$) affect the outcome of the patient.
Similarly, $\beta_2$ and $\tilde{\beta}_2$ characterize the actor effects and
$\gamma_2$ and $\tilde{\gamma}_2$ characterize the partner effects for the
spouse of the patient. Estimates of the actor and partner effects provide
important information about the interdependence within dyads. We assume that
the residuals $e_{1ij}$ and $e_{2ij}$ are independent and follow normal
distributions $N(0, \sigma_1^2)$ and $N(0, \sigma_2^2)$, respectively, and that
$e_{1ij}$ and $e_{2ij}$ are independent of the random effects $b_i$. The
parameters $\alpha_1$ and $\alpha_2$ are intercepts for the patients and
spouses, respectively.
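
To make the actor and partner terms of model (3.1) concrete, the following minimal Python sketch evaluates the conditional means of both members' outcomes for a first-order model with one scalar covariate per member. The function name, the dictionary keys and the numerical values are illustrative assumptions, not part of the study.

```python
def dyad_means(b_i, y1_prev, y2_prev, x1, x2, p1, p2):
    """Conditional means of (Y_1ij, Y_2ij) under a first-order version of (3.1).

    p1 and p2 are dicts with hypothetical keys:
      'alpha'   intercept,
      'beta'    actor effect of the member's own previous outcome,
      'gamma'   partner effect of the other member's previous outcome,
      'beta_x'  actor effect of the member's own covariate,
      'gamma_x' partner effect of the other member's covariate.
    """
    mean1 = (b_i + p1["alpha"]
             + p1["beta"] * y1_prev + p1["gamma"] * y2_prev
             + p1["beta_x"] * x1 + p1["gamma_x"] * x2)
    mean2 = (b_i + p2["alpha"]
             + p2["beta"] * y2_prev + p2["gamma"] * y1_prev
             + p2["beta_x"] * x2 + p2["gamma_x"] * x1)
    return mean1, mean2


# Illustrative values only.
patient = {"alpha": 0.0, "beta": 0.5, "gamma": 0.5, "beta_x": 1.0, "gamma_x": 1.0}
spouse = {"alpha": 0.0, "beta": 0.6, "gamma": 0.6, "beta_x": 1.0, "gamma_x": 1.0}
m1, m2 = dyad_means(0.0, 5.0, 7.0, 0.0, 0.0, patient, spouse)
```

Setting a partner coefficient to zero in this sketch removes the dependence of one member's mean on the other member's history, which is the sense in which the partner effects measure interdependence.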

In many situations, the conditional distribution of $Y_{kij}$ given $H_{kij}$
and $X_{kij}$ depends only on the q prior outcomes
$y_{ki,j-1}, \ldots, y_{ki,j-q}$ and $X_{kij}$. If this is the case, we obtain
the so-called qth-order transition model, a type of transition model that is
most useful in practice [Diggle et al. (2002)]. The choice of the model order q
depends on the subject matter. In many applications, it is often reasonable to
set q = 1, in which case the current outcome depends only on the most recent
previous outcome, leading to the commonly used Markov models.
The likelihood ratio test can be used to assess whether a specific value of q 
is appropriate [Kalbfleisch and Lawless (1985)]. Auto-correlation analysis of 
the outcome history also can provide useful information to determine the 
value of q [Gottman (1981); Kendall and Ord (1990)]. 

Define $Y_{ki} = (Y_{ki1}, \ldots, Y_{ki,d_{ki}})$ and
$X_{ki} = (X_{ki1}, \ldots, X_{ki,d_{ki}})$ for k = 1, 2. Given
$\{X_{1i}, X_{2i}\}$ and the random effect $b_i$, the joint log likelihood of
$(Y_{1i}, Y_{2i})$ for the ith dyad under the qth-order (random-effects)
transition model is given by

$$
\begin{aligned}
\ell_i(Y_{1i}, Y_{2i} \mid X_{1i}, X_{2i}, b_i)
&= \sum_{j=q+1}^{d_{1i}} \ell_{ij}(Y_{1ij} \mid X_{1ij}, X_{2ij}, H_{1ij}, H_{2ij}, b_i)
   + \ell_i(Y_{1i1}, \ldots, Y_{1iq} \mid X_{1i}, X_{2i}) \\
&\quad + \sum_{j=q+1}^{d_{2i}} \ell_{ij}(Y_{2ij} \mid X_{1ij}, X_{2ij}, H_{1ij}, H_{2ij}, b_i)
   + \ell_i(Y_{2i1}, \ldots, Y_{2iq} \mid X_{1i}, X_{2i}),
\end{aligned}
$$

where $\ell_{ij}(Y_{kij} \mid X_{1ij}, X_{2ij}, H_{1ij}, H_{2ij}, b_i)$ is the
likelihood corresponding to model (3.1), and
$\ell_i(Y_{ki1}, \ldots, Y_{kiq} \mid X_{1i}, X_{2i})$ is assumed free of
$\eta_k = (\alpha_k, \beta_k, \tilde{\beta}_k, \gamma_k, \tilde{\gamma}_k)$,
for k = 1, 2.

An important feature of model (3.1) that distinguishes it from the stan- 
dard transition model is that the current value of the outcome Y depends 
on not only the subject's outcome history, but also the spouse's outcome 
history. Such a "partner" effect is of particular interest in dyadic studies be- 
cause it reflects the interdependence between the patients and spouses. This 
interdependence within dyads also makes the missing data problem more 
challenging. Consider a dyad consisting of subjects A and B, and suppose that B
drops out prematurely. Because the outcome history of B is used as a co- 
variate in the transition model of A, when B drops out, we face not only 
the missing outcome (for B) but also missing covariates (for A). We address 
this dual missing data problem using the data augmentation approach, as 
described in Section 4. 

To account for nonignorable dropouts, we employ the discrete time sur- 
vival model [Agresti (2002)] to jointly model the missing data mechanism. 
Specifically, we assume that the distribution of $D_{ki}$ depends on both the
past history of the longitudinal process and the current outcome $Y_{kij}$,
but not on future observations. Define the discrete hazard rate
$\lambda_{kij}(H_{kij}, Y_{kij}, X_{kij}) = \Pr(D_{ki} = j \mid D_{ki} > j - 1,
H_{kij}, Y_{kij}, X_{kij})$. It follows that the probability of dropout for
member k in the ith dyad is given by

$$
\Pr(D_{ki} = d \mid H_{kij}, Y_{kij}, X_{kij}) =
\begin{cases}
\prod_{j=2}^{d-1} \{1 - \lambda_{kij}(H_{kij}, Y_{kij}, X_{kij})\}\,
  \lambda_{kid}(H_{kid}, Y_{kid}, X_{kid}), & d \le J, \\[4pt]
\prod_{j=2}^{J} \{1 - \lambda_{kij}(H_{kij}, Y_{kij}, X_{kij})\}, & d = J + 1.
\end{cases}
$$

We specify the discrete hazard rate $\lambda_{kij}(H_{kij}, Y_{kij}, X_{kij})$
using the logistic regression model:

$$
\begin{aligned}
\mathrm{logit}(\lambda_{1ij}(H_{1ij}, Y_{1ij}, X_{1ij}))
  &= c_i + \zeta_1 + X_{1ij}^T \psi_1 + H_{1ij}^T \delta_1 + \phi_1 Y_{1ij}, \\
\mathrm{logit}(\lambda_{2ij}(H_{2ij}, Y_{2ij}, X_{2ij}))
  &= c_i + \zeta_2 + X_{2ij}^T \psi_2 + H_{2ij}^T \delta_2 + \phi_2 Y_{2ij}, \\
c_i &\sim N(0, \tau_c^2),
\end{aligned}
\tag{3.2}
$$

where $c_i$ is the random effect accounting for the within-dyad correlation,
and $\zeta_k$, $\psi_k$, $\delta_k$ and $\phi_k$, k = 1, 2, are unknown
parameters. In this dropout
model, we assume that, conditioning on the random effects, a subject's co- 
variates, past history and current (unobserved) outcome, the dropout prob- 
ability of this subject is independent of the characteristics and outcomes of 
the other member in the dyad. The spouse may indirectly affect the dropout 
rate of the patient through influencing the patient's depression status; how- 
ever, when conditional on the patient's depression score, the dropout of the 
patient does not depend on her spouse's depression score. 
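
The step from the discrete hazard to the distribution of the dropout time can be sketched in a few lines of Python. The function name and the hazard values below are hypothetical; the sketch only illustrates that the probabilities of dropping out at times 2, ..., J and of completing the study (d = J + 1) sum to one.

```python
def dropout_pmf(hazards):
    """Distribution of the dropout time D for one subject.

    hazards[j - 2] is the discrete hazard at measurement time j, j = 2, ..., J.
    Returns a dict {d: Pr(D = d)} for d = 2, ..., J + 1, where d = J + 1
    means that the subject completes the study.
    """
    pmf = {}
    survive = 1.0  # probability of still being in the study before time j
    for offset, lam in enumerate(hazards):
        d = offset + 2
        pmf[d] = survive * lam
        survive *= 1.0 - lam
    pmf[len(hazards) + 2] = survive
    return pmf


# Hypothetical hazards for a study with J = 3 measurement times.
pmf = dropout_pmf([0.12, 0.26])
```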

In practice, we often expect that, given $Y_{kij}$ and $Y_{ki,j-1}$, the
conditional dependence of $D_{ki}$ on $Y_{ki,j-2}, \ldots, Y_{ki1}$ will be
negligible because, temporally, the patient's (current) decision of dropout is
mostly driven by his (or her) current and most recent outcome statuses. Using
the breast cancer study as an example, we do not expect that the early history
of depression plays an important role for the patient's current decision of
dropout; instead, the patient drops out typically because she is currently
experiencing or most recently experienced high depression. The early history
may influence the dropout but mainly through its effects on the current
depression status. Once conditioning on the current and the most recent
depression statuses, the influence from the early history is essentially
negligible. Thus, we use a simpler form of the discrete hazard model

$$
\mathrm{logit}(\lambda_{kij}(H_{kij}, Y_{kij}, X_{kij}))
  = c_i + \zeta_k + X_{kij}^T \psi_k + \delta_k Y_{ki,j-1} + \phi_k Y_{kij},
\qquad k = 1, 2.
$$
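
As a rough illustration of the simplified hazard, the sketch below evaluates the logistic hazard for one subject at one measurement time. The function and argument names are assumptions made for this example. When the coefficient of the current outcome (phi) is zero, the hazard no longer depends on the possibly missing current outcome.

```python
import math


def dropout_hazard(c_i, zeta, psi, delta, phi, x, y_prev, y_cur):
    """Discrete hazard of dropout under the simplified logistic model.

    c_i    dyad-level random effect
    zeta   intercept
    psi    list of covariate coefficients, x the matching covariate values
    delta  coefficient of the previous outcome y_prev
    phi    coefficient of the current outcome y_cur
    """
    eta = (c_i + zeta + sum(p * v for p, v in zip(psi, x))
           + delta * y_prev + phi * y_cur)
    # Numerically stable inverse logit.
    if eta >= 0.0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)
```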

4. Estimation. Under the Bayesian paradigm, we assign the following 
vague priors to the unknown parameters and fit the proposed model using 
a Gibbs sampler: 

$\alpha_k, \beta_k, \tilde{\beta}_k, \gamma_k, \tilde{\gamma}_k, \zeta_k, \psi_k, \delta_k$ and $\phi_k \propto$ constant, k = 1, 2;

$\sigma_k^2 \sim IG(a, b)$, k = 1, 2;

$\tau_b^2 \sim IG(a, b)$;

$\tau_c^2 \sim IG(a, b)$;

where IG(a, b) denotes an inverse gamma distribution with a shape parameter a
and a scale parameter b. We set a and b at small values, such as 0.1, so that
the data dominate the prior information. Let $y_{obs}$ and $y_{mis}$ denote
the observed and missing parts of the data, respectively. Considering the kth
iteration of the Gibbs sampler, the first step of the iteration is "data
augmentation" [Tanner and Wong (1987)], in which the missing data $y_{mis}$
are generated from their full conditional distributions. Without loss of
generality, suppose for the ith dyad, member 2 drops out of the study no later
than member 1, that is, $d_{1i} \ge d_{2i}$, and let
$d_i = \max(d_{1i}, d_{2i})$. Assuming a first-order (q = 1) transition model
(or Markov model) and letting $\theta$ denote a generic symbol that represents
the values of all other model parameters, the data augmentation consists of
the following 3 steps:

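(1) For $j = d_{2i}, \ldots, d_i - 1$, draw $y_{2ij}$ from the conditional
distribution

$$
y_{2ij} \mid y_{obs}, \theta \propto
N\!\left(
\frac{\sigma_1^2 \mu_1 + \beta_2 \sigma_1^2 \mu_2 + \gamma_1 \sigma_2^2 \mu_3}
     {(1 + \beta_2^2)\sigma_1^2 + \gamma_1^2 \sigma_2^2},\;
\frac{\sigma_1^2 \sigma_2^2}
     {(1 + \beta_2^2)\sigma_1^2 + \gamma_1^2 \sigma_2^2}
\right)
\times \lambda_{2id_{2i}}(H_{2id_{2i}}, y_{2id_{2i}}, X_{2id_{2i}})^{I(j = d_{2i})},
$$

where

$$
\begin{aligned}
\mu_1 &= b_i + \beta_2 y_{2i,j-1} + \gamma_2 y_{1i,j-1} + \alpha_2
         + X_{2ij}^T \tilde{\beta}_2 + X_{1ij}^T \tilde{\gamma}_2, \\
\mu_2 &= y_{2i,j+1} - b_i - \gamma_2 y_{1ij} - \alpha_2
         - X_{2i,j+1}^T \tilde{\beta}_2 - X_{1i,j+1}^T \tilde{\gamma}_2, \\
\mu_3 &= y_{1i,j+1} - b_i - \beta_1 y_{1ij} - \alpha_1
         - X_{1i,j+1}^T \tilde{\beta}_1 - X_{2i,j+1}^T \tilde{\gamma}_1.
\end{aligned}
$$

(2) Draw $y_{2i,d_i}$ from the conditional distribution

$$
y_{2i,d_i} \mid y_{obs}, \theta \sim
N(b_i + y_{2i,d_i-1} \beta_2 + y_{1i,d_i-1} \gamma_2 + \alpha_2
  + X_{2id_i}^T \tilde{\beta}_2 + X_{1id_i}^T \tilde{\gamma}_2,\; \sigma_2^2).
$$

(3) Draw $y_{1i,d_i}$ from the conditional distribution

$$
y_{1i,d_i} \mid y_{obs}, \theta \propto
N(b_i + y_{1i,d_i-1} \beta_1 + y_{2i,d_i-1} \gamma_1 + \alpha_1
  + X_{1id_i}^T \tilde{\beta}_1 + X_{2id_i}^T \tilde{\gamma}_1,\; \sigma_1^2)
\times \lambda_{1id_i}(H_{1id_i}, y_{1id_i}, X_{1id_i}).
$$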
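
The normal part of the step (1) conditional follows from completing the square in the three normal terms that involve the missing value: the spouse's own model at time j, the spouse's model at time j + 1 and the patient's model at time j + 1. The sketch below computes the resulting mean and variance. It also shows one possible way, assumed here and not stated in the text, to handle the extra hazard factor: because the hazard lies between 0 and 1, a draw from the normal part can be accepted with probability equal to the hazard (rejection sampling). All names are hypothetical.

```python
import math
import random


def step1_normal_kernel(mu1, mu2, mu3, beta2, gamma1, sigma1_sq, sigma2_sq):
    """Mean and variance of the normal kernel for a missing y_2ij.

    Precision-weighted combination of three pieces of information:
      y ~ N(mu1, sigma2_sq),  mu2 ~ N(beta2 * y, sigma2_sq),
      mu3 ~ N(gamma1 * y, sigma1_sq).
    """
    denom = (1.0 + beta2 ** 2) * sigma1_sq + gamma1 ** 2 * sigma2_sq
    mean = (sigma1_sq * mu1 + beta2 * sigma1_sq * mu2
            + gamma1 * sigma2_sq * mu3) / denom
    var = sigma1_sq * sigma2_sq / denom
    return mean, var


def draw_with_hazard(mean, var, hazard_fn, rng, max_tries=100000):
    """Rejection sampler for a density proportional to N(mean, var) * hazard.

    hazard_fn maps a candidate value to a probability in [0, 1].
    """
    sd = math.sqrt(var)
    for _ in range(max_tries):
        y = rng.gauss(mean, sd)
        if rng.random() < hazard_fn(y):
            return y
    raise RuntimeError("no draw accepted; the hazard may be too small here")
```

Rejection sampling can be slow when the hazard is close to zero over the bulk of the normal kernel, in which case a Metropolis step would be a natural alternative.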

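Now, with the augmented complete data $y = \{y_{obs}, y_{mis}\}$, the
parameters are drawn alternately as follows:

(4) For i = 1, ..., n, draw random effects $b_i$ from the conditional
distribution

$$
b_i \mid y, \theta = N\!\left(
\frac{\sigma_2^2 \tau_b^2 \sum_{j=2}^{d_i} (y_{1ij} - \mu_{1ij})
      + \sigma_1^2 \tau_b^2 \sum_{j=2}^{d_i} (y_{2ij} - \mu_{2ij})}
     {(d_i - 1)\sigma_2^2 \tau_b^2 + (d_i - 1)\sigma_1^2 \tau_b^2 + \sigma_1^2 \sigma_2^2},\;
\frac{\sigma_1^2 \sigma_2^2 \tau_b^2}
     {(d_i - 1)\sigma_2^2 \tau_b^2 + (d_i - 1)\sigma_1^2 \tau_b^2 + \sigma_1^2 \sigma_2^2}
\right),
$$

where

$$
\begin{aligned}
\mu_{1ij} &= y_{1i,j-1} \beta_1 + y_{2i,j-1} \gamma_1 + \alpha_1
             + X_{1ij}^T \tilde{\beta}_1 + X_{2ij}^T \tilde{\gamma}_1, \\
\mu_{2ij} &= y_{2i,j-1} \beta_2 + y_{1i,j-1} \gamma_2 + \alpha_2
             + X_{2ij}^T \tilde{\beta}_2 + X_{1ij}^T \tilde{\gamma}_2.
\end{aligned}
$$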
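
The conditional distribution in step (4) is a standard normal-normal update: each residual (outcome minus the fixed part of its mean) is a noisy observation of the random effect. A minimal sketch, with hypothetical names and written in precision form, is:

```python
def random_effect_posterior(resid1, resid2, sigma1_sq, sigma2_sq, tau_b_sq):
    """Mean and variance of the full conditional of a dyad's random effect.

    resid1[j] = y_1ij - mu_1ij and resid2[j] = y_2ij - mu_2ij are the
    residuals of the two members after removing the fixed part of the mean.
    """
    precision = (len(resid1) / sigma1_sq + len(resid2) / sigma2_sq
                 + 1.0 / tau_b_sq)
    mean = (sum(resid1) / sigma1_sq + sum(resid2) / sigma2_sq) / precision
    return mean, 1.0 / precision


# Illustrative values: two residuals per member, unit variances.
post_mean, post_var = random_effect_posterior([1.0, 1.0], [1.0, 1.0],
                                              1.0, 1.0, 1.0)
```

Multiplying the numerator and denominator by the product of the three variances gives the closed form in which the variances appear as factors rather than as divisors.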
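(5) Draw $\sigma_k^2$, k = 1, 2, from the conditional distribution

$$
\sigma_k^2 \mid y, \theta = IG\!\left(
a + \frac{\sum_{i=1}^{n} (d_i - 1)}{2},\;
b + \frac{\sum_{i=1}^{n} \sum_{j=2}^{d_i} (y_{kij} - \mu^*_{kij})^2}{2}
\right),
$$

where

$$
\begin{aligned}
\mu^*_{1ij} &= b_i + y_{1i,j-1} \beta_1 + y_{2i,j-1} \gamma_1 + \alpha_1
               + X_{1ij}^T \tilde{\beta}_1 + X_{2ij}^T \tilde{\gamma}_1, \\
\mu^*_{2ij} &= b_i + y_{2i,j-1} \beta_2 + y_{1i,j-1} \gamma_2 + \alpha_2
               + X_{2ij}^T \tilde{\beta}_2 + X_{1ij}^T \tilde{\gamma}_2.
\end{aligned}
$$

(6) Draw $\tau_b^2$ from the conditional distribution

$$
\tau_b^2 \mid y, \theta = IG\!\left(a + \frac{n}{2},\; b + \frac{\sum_{i=1}^{n} b_i^2}{2}\right).
$$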
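
The inverse gamma updates can be sketched with the Python standard library alone. The helper names are hypothetical. The draw relies on the fact that the reciprocal of a gamma variable with shape a and scale 1/b follows an inverse gamma distribution with shape a and scale b.

```python
import random


def draw_inverse_gamma(shape, scale, rng):
    """One draw from IG(shape, scale).

    random.gammavariate(alpha, beta) uses beta as the scale of the gamma
    distribution, so Gamma(shape, 1 / scale) is inverted here.
    """
    return 1.0 / rng.gammavariate(shape, 1.0 / scale)


def variance_posterior(values, a=0.1, b=0.1):
    """Shape and scale of the IG full conditional of a variance parameter.

    values holds the random effects b_i (for the random-effect variance) or
    the residuals y_kij - mu*_kij (for a residual variance).
    """
    shape = a + len(values) / 2.0
    scale = b + sum(v * v for v in values) / 2.0
    return shape, scale


rng = random.Random(2024)
shape, scale = variance_posterior([1.0, -1.0, 2.0])
tau_b_sq = draw_inverse_gamma(shape, scale, rng)
```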

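(7) Draw $\eta_1 = (\alpha_1, \beta_1, \gamma_1, \tilde{\beta}_1, \tilde{\gamma}_1)$
from the normal distribution

$$
\eta_1 \mid y, \theta = N\big((Z_1^T Z_1)^{-1} Z_1^T (y_1 - b),\; (Z_1^T Z_1)^{-1} \sigma_1^2\big),
$$

where $y_1 = (y_{11,2}, \ldots, y_{11,d_1}, \ldots, y_{1i,2}, \ldots, y_{1i,d_i},
\ldots, y_{1n,2}, \ldots, y_{1n,d_n})^T$ and

$$
Z_1 =
\begin{pmatrix}
1 & \cdots & 1 & \cdots & 1 & \cdots & 1 & \cdots \\
y_{11,1} & \cdots & y_{11,d_1-1} & \cdots & y_{1i,1} & \cdots & y_{1i,d_i-1} & \cdots \\
y_{21,1} & \cdots & y_{21,d_1-1} & \cdots & y_{2i,1} & \cdots & y_{2i,d_i-1} & \cdots \\
X_{11,2} & \cdots & X_{11,d_1} & \cdots & X_{1i,2} & \cdots & X_{1i,d_i} & \cdots \\
X_{21,2} & \cdots & X_{21,d_1} & \cdots & X_{2i,2} & \cdots & X_{2i,d_i} & \cdots
\end{pmatrix}^T.
$$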
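(8) Similarly, draw $\eta_2 = (\alpha_2, \beta_2, \gamma_2, \tilde{\beta}_2, \tilde{\gamma}_2)$
from the conditional distribution

$$
\eta_2 \mid y, \theta = N\big((Z_2^T Z_2)^{-1} Z_2^T (y_2 - b),\; (Z_2^T Z_2)^{-1} \sigma_2^2\big),
$$

where $Z_2$ and $y_2$ are defined in a similar way to $Z_1$ and $y_1$.

(9) Draw $\varpi_1 = (\zeta_1, \psi_1, \delta_1, \phi_1)$ and
$\varpi_2 = (\zeta_2, \psi_2, \delta_2, \phi_2)$ from the conditional
distributions

$$
\begin{aligned}
\varpi_1 \mid y, \theta &\propto \prod_{i=1}^{n} \left\{ \prod_{j=2}^{d_{1i}-1} (1 - \lambda_{1ij}) \right\} \lambda_{1id_{1i}}, \\
\varpi_2 \mid y, \theta &\propto \prod_{i=1}^{n} \left\{ \prod_{j=2}^{d_{2i}-1} (1 - \lambda_{2ij}) \right\} \lambda_{2id_{2i}}.
\end{aligned}
$$

(10) Draw random effects $c_i$ from the conditional distribution

$$
c_i \mid y, \theta \propto N(0, \tau_c^2)
\left\{ \prod_{j=2}^{d_{1i}-1} (1 - \lambda_{1ij}) \right\} \lambda_{1id_{1i}}
\left\{ \prod_{j=2}^{d_{2i}-1} (1 - \lambda_{2ij}) \right\} \lambda_{2id_{2i}}.
$$

(11) Draw $\tau_c^2$ from the conditional distribution

$$
\tau_c^2 \mid y, \theta = IG\!\left(a + \frac{n}{2},\; b + \frac{\sum_{i=1}^{n} c_i^2}{2}\right).
$$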
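
The conditionals in steps (9) and (10) are not standard distributions, and the text does not say how they are sampled. One common option, assumed here purely for illustration, is a random-walk Metropolis update. The sketch below applies it to the dyad-level random effect of the dropout model; the function names, the step size and the way the linear predictors are passed in are all hypothetical.

```python
import math
import random


def log1pexp(z):
    """Numerically stable log(1 + exp(z))."""
    if z > 0:
        return z + math.log1p(math.exp(-z))
    return math.log1p(math.exp(z))


def log_target_c(c, tau_c_sq, eta_stay, eta_drop):
    """Log of the unnormalized full conditional of the random effect c.

    eta_stay: linear predictors (excluding c) at visits where a member stayed,
              each contributing log(1 - hazard).
    eta_drop: linear predictors (excluding c) at visits where a member
              dropped out, each contributing log(hazard).
    """
    value = -0.5 * c * c / tau_c_sq          # N(0, tau_c_sq) prior kernel
    for eta in eta_stay:
        value -= log1pexp(c + eta)           # log(1 - expit(z)) = -log(1 + e^z)
    for eta in eta_drop:
        value += (c + eta) - log1pexp(c + eta)  # log expit(z)
    return value


def metropolis_step(c, tau_c_sq, eta_stay, eta_drop, rng, step=0.5):
    """One random-walk Metropolis update of c; returns the new state."""
    proposal = c + rng.gauss(0.0, step)
    log_ratio = (log_target_c(proposal, tau_c_sq, eta_stay, eta_drop)
                 - log_target_c(c, tau_c_sq, eta_stay, eta_drop))
    if log_ratio >= 0.0 or rng.random() < math.exp(log_ratio):
        return proposal
    return c
```

The same pattern would apply to the regression parameters of the dropout model in step (9), with the prior kernel replaced by a constant.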

5. Simulation studies. We conducted two simulation studies (A and B). 
Simulation A consists of 500 data sets, each with 200 dyads and three
repeated measures. For the ith dyad, we generated the first measurements,
$Y_{1i1}$ and $Y_{2i1}$, from normal distributions N(5, 1) and N(7, 1),
respectively, and generated the second and third measurements based on the
following random-effects transition model:

$$
\begin{aligned}
Y_{1ij} \mid b_i &\sim N(b_i + \beta_1 y_{1i,j-1} + \gamma_1 y_{2i,j-1}
  + \tilde{\beta}_1 x_1 + \tilde{\gamma}_1 x_2,\; 1), \quad j = 2, 3, \\
Y_{2ij} \mid b_i &\sim N(b_i + \beta_2 y_{2i,j-1} + \gamma_2 y_{1i,j-1}
  + \tilde{\beta}_2 x_2 + \tilde{\gamma}_2 x_1,\; 1), \quad j = 2, 3, \\
b_i &\sim N(0, 1),
\end{aligned}
$$

where $\beta_1 = \gamma_1 = 0.5$, $\beta_2 = \gamma_2 = 0.6$,
$\tilde{\beta}_1 = \tilde{\gamma}_1 = \tilde{\beta}_2 = \tilde{\gamma}_2 = 1$,
and covariates $X_1$ and $X_2$ were generated independently from N(0, 1). We
assumed that the baseline (first) measurements $Y_{1i1}$ and $Y_{2i1}$ were
observed for all subjects, and the hazard of dropout at the second and third
measurement times depended on the current and last observed values of Y,
that is,

$$
\begin{aligned}
\mathrm{logit}(\lambda_{1ij} \mid c_i) &= c_i - Y_{1ij} - 0.5 Y_{1i,j-1} - 6, \quad j = 2, 3, \\
\mathrm{logit}(\lambda_{2ij} \mid c_i) &= c_i - Y_{2ij} - 0.5 Y_{2i,j-1} - 6, \quad j = 2, 3, \\
c_i &\sim N(0, 1).
\end{aligned}
$$
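
The outcome part of this design can be sketched as follows. The sketch generates complete (pre-dropout) outcomes for one dyad using the parameter values quoted above; it assumes that each member's covariate is a single value held fixed over time, and the function name is hypothetical. The dropout step is deliberately left out.

```python
import random


def simulate_dyad(rng):
    """Complete outcomes (before dropout) for one dyad, three measurements.

    Returns two lists [y_k1, y_k2, y_k3] for members k = 1, 2.
    Assumes time-invariant scalar covariates x1 and x2.
    """
    b = rng.gauss(0.0, 1.0)                   # dyad-specific random effect
    x1, x2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    y1 = [rng.gauss(5.0, 1.0)]                # baseline, member 1
    y2 = [rng.gauss(7.0, 1.0)]                # baseline, member 2
    for _ in range(2):                        # measurement times j = 2, 3
        # Both means use the previous outcomes of both members.
        mean1 = b + 0.5 * y1[-1] + 0.5 * y2[-1] + 1.0 * x1 + 1.0 * x2
        mean2 = b + 0.6 * y2[-1] + 0.6 * y1[-1] + 1.0 * x2 + 1.0 * x1
        y1.append(rng.gauss(mean1, 1.0))
        y2.append(rng.gauss(mean2, 1.0))
    return y1, y2


rng = random.Random(1)
y1, y2 = simulate_dyad(rng)
```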
Table 1

Bias, standard error (SE) and coverage rate of 95% credible intervals under different
methods for simulation A

                      Complete-case analysis     Available-case analysis    Proposed method
Parameter             Bias     SE     Coverage   Bias     SE     Coverage   Bias     SE     Coverage
$\beta_1$             -0.03    0.06   0.93       -0.01    0.05   0.94       -0.01    0.05   0.95
$\gamma_1$            -0.06    0.05   0.81       -0.03    0.04   0.88        0.07    0.04   0.96
$\tilde{\beta}_1$     -0.16    0.12   0.72       -0.10    0.10   0.81        0.05    0.08   0.94
$\tilde{\gamma}_1$    -0.17    0.12   0.75       -0.10    0.10   0.78        0.02    0.09   0.97
$\beta_2$             -0.06    0.06   0.89       -0.06    0.05   0.84        0.08    0.05   0.97
$\gamma_2$            -0.04    0.05   0.87       -0.00    0.04   0.95       -0.04    0.06   0.96
$\tilde{\beta}_2$     -0.17    0.12   0.73       -0.10    0.10   0.84       -0.01    0.12   0.95
$\tilde{\gamma}_2$    -0.17    0.12   0.72       -0.10    0.10   0.81        0.01    0.09   0.97


Under this dropout model, on average, 24% (12% of member 1 and 13% 
of member 2) of the dyads dropped out at the second time point and 45% 
(26% of member 1 and 30% of member 2) dropped out at the third mea- 
surement time. We applied the proposed method to the simulated data sets. 
We used 1,000 iterations to burn in and made inference based on 10,000 
posterior draws. For comparison purposes, we also conducted complete-case 
and available-case analyses. The complete-case analysis was based on the 
data from dyads who completed the follow-up, and the available-case
analysis was based on all observed data (without considering the missing data
mechanism).

Table 1 shows the bias, standard error (SE) and coverage rate of the 95% 
credible interval (CI) under different approaches. We can see that the pro- 
posed method substantially outperformed the complete-case and available- 
case analyses. Our method yielded estimates with smaller bias and cover- 
age rates close to the 95% nominal level. In contrast, the complete-case 
and available-case analyses often led to larger bias and poor coverage rates. 
For example, the biases of the estimate of $\tilde{\beta}_1$ under the
complete-case and available-case analyses were -0.16 and -0.10, respectively,
substantially larger in magnitude than that under the proposed method (i.e.,
0.05); the coverage rate using the proposed method was about 94%, whereas those
using the complete-case and available-case analyses were under 82%.
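
The three summary measures reported in the simulation tables can be computed from replicate results along the following lines. The sketch assumes that SE is the empirical standard deviation of the estimates across simulated data sets, which the text does not state explicitly; the function name and inputs are hypothetical.

```python
import math


def summarize(estimates, lowers, uppers, truth):
    """Bias, standard error and interval coverage across simulation replicates.

    estimates      point estimates, one per simulated data set
    lowers/uppers  limits of the 95% credible intervals
    truth          true parameter value used to generate the data
    """
    n = len(estimates)
    mean_est = sum(estimates) / n
    bias = mean_est - truth
    # Empirical standard deviation of the estimates (assumed meaning of SE).
    se = math.sqrt(sum((e - mean_est) ** 2 for e in estimates) / (n - 1))
    covered = sum(1 for lo, hi in zip(lowers, uppers) if lo <= truth <= hi)
    return bias, se, covered / n


# Toy replicate results, for illustration only.
bias, se, coverage = summarize([0.9, 1.1, 1.0, 1.2],
                               [0.7, 0.9, 0.8, 1.05],
                               [1.1, 1.3, 1.2, 1.4],
                               truth=1.0)
```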

The second simulation study (Simulation B) was designed to evaluate 
the performance of the proposed method when the nonignorable missing 
data mechanism is misspecified, for example, data actually are missing at 
random (MAR). We generated the first measurements, $Y_{1i1}$ and $Y_{2i1}$,
from the normal distribution N(3, 1) independently, and generated the second
and third measurements based on the same transition model as in Simulation A.
We assumed the hazard of dropout at the second and third measurement times
depended on the previous (observed) value of Y quadratically, but not on the
current (missing) value of Y, that is,

$$
\begin{aligned}
\mathrm{logit}(\lambda_{1ij} \mid c_i) &= c_i + Y_{1i,j-1}^2 - 15, \quad j = 2, 3, \\
\mathrm{logit}(\lambda_{2ij} \mid c_i) &= c_i + Y_{2i,j-1}^2 - 15, \quad j = 2, 3, \\
c_i &\sim N(0, 1).
\end{aligned}
\tag{5.1}
$$

Under this MAR dropout model, on average, 37% (21% of member 1 and 
21% of member 2) of the dyads dropped out at the second time point and 
27% (24% of member 1 and 33% of member 2) dropped out at the third 
measurement time. To fit the simulated data, we considered two nonignor- 
able models with different specifications of the dropout (or selection) model. 
The first nonignorable model assumed a flexible dropout model 

$$
\mathrm{logit}(\lambda_{kij} \mid c_i) = c_i + \zeta_k + \delta_k Y_{ki,j-1}^2 + \phi_k Y_{kij},
$$

which included the true dropout process (5.1) as a specific case with
$\phi_k = 0$; and the second nonignorable model took a misspecified dropout
model of the form

$$
\mathrm{logit}(\lambda_{kij} \mid c_i) = c_i + \zeta_k + \delta_k Y_{ki,j-1} + \phi_k Y_{kij}.
$$

Table 2 shows the bias, standard error and coverage rate of the 95% CI 
under different approaches. When the missing data were MAR, the complete- 
case analysis was invalid and led to biased estimates and poor coverage rates 
because the complete cases are not random samples from the original pop- 
ulation. In contrast, the available-case analysis yielded unbiased estimates 
and coverage rates close to the 95% nominal level. For the nonignorable mod- 
els, the one with the flexible dropout model yielded unbiased estimates and 
reasonable coverage rates, whereas the model with the misspecified dropout 
model led to biased estimates (e.g., $\beta_1$ and $\beta_2$) and poor coverage rates. This
result is not surprising because it is well known that selection models are sen- 
sitive to the misspecification of the dropout model [Little and Rubin (2002); 
Daniels and Hogan (2000)]. For nonignorable missing data, the difficulty is 
that we cannot judge whether a specific dropout model is misspecified or 
not based solely on observed data because the observed data contain no 
information about the (nonignorable) missing data mechanism. To address 
this difficulty, one possible approach is to specify a flexible dropout model to 
decrease the chance of model misspecification. Alternatively, and perhaps
preferably, one can conduct a sensitivity analysis to evaluate how the results vary
when the dropout model varies. We will illustrate the latter approach in the 
next section. 

Table 2

Bias, standard error (SE) and coverage rate of 95% credible intervals under different
methods for simulation B

                                                                            Nonignorable model         Nonignorable model
                      Complete-case analysis     Available-case analysis    (flexible dropout model)   (misspecified dropout model)
Parameter             Bias     SE     Coverage   Bias     SE     Coverage   Bias     SE     Coverage   Bias     SE     Coverage
$\beta_1$             -0.06    0.08   0.86        0.00    0.06   0.95       -0.01    0.06   0.95        0.14    0.06   0.78
$\gamma_1$            -0.09    0.08   0.82        0.00    0.05   0.96        0.07    0.05   0.97       -0.01    0.05   0.95
$\tilde{\beta}_1$     -0.11    0.14   0.84        0.00    0.10   0.95        0.04    0.08   0.96        0.03    0.08   0.94
$\tilde{\gamma}_1$    -0.13    0.14   0.84        0.00    0.10   0.96        0.02    0.09   0.97        0.02    0.09   0.98
$\beta_2$             -0.07    0.08   0.87        0.00    0.06   0.96        0.02    0.06   0.97        0.12    0.06   0.79
$\gamma_2$            -0.10    0.08   0.78        0.00    0.07   0.96        0.00    0.06   0.96       -0.08    0.06   0.93
$\tilde{\beta}_2$     -0.14    0.14   0.82        0.00    0.10   0.96        0.01    0.12   0.94        0.01    0.12   0.95
$\tilde{\gamma}_2$    -0.14    0.13   0.83        0.01    0.10   0.96        0.01    0.09   0.97        0.01    0.09   0.98

6. Application. We applied our method to the longitudinal metastatic
breast cancer data. We used the first-order random-effects transition model
for the longitudinal measurement process. In the model, we included 5
covariates: chronic pain measured by the Multidimensional Pain Inventory
(MPI) and previous CESD scores from both the patients and spouses, and
the patient's stage of cancer. In the discrete-time dropout model, we
included the subject's current and previous CESD scores, MPI measurements
and the patient's stage of cancer as covariates. Age was excluded from the
models because its estimate was very close to zero and not significant. We
used 5,000 iterations to burn in and made inference based on 5,000 posterior
draws. We also conducted the complete-case and available-case analyses for
the purpose of comparison.

Table 3

Parameter estimates and 95% credible intervals (shown in parentheses) for the patients'
and spouses' measurement models based on the complete-case, available-case analyses
and the proposed approach for the breast cancer data

                   Complete-case analysis    Available-case analysis    Proposed method
Patients
  Intercept         2.53 (-1.71, 6.77)        0.99 (-2.55, 4.52)         5.10 (3.31, 6.59)
  Patient CESD      0.43 (0.29, 0.58)         0.56 (0.44, 0.68)          0.87 (0.80, 0.93)
  Spouse CESD       0.07 (-0.06, 0.20)        0.06 (-0.06, 0.17)         0.14 (0.09, 0.19)
  Patient MPI       0.94 (0.22, 1.67)         0.82 (0.21, 1.43)          1.24 (0.83, 1.64)
  Spouse MPI        1.06 (0.29, 1.82)         0.90 (0.31, 1.48)          0.62 (0.40, 0.84)
  Cancer stage      0.39 (-0.81, 1.60)        0.59 (-0.43, 1.60)         0.10 (-0.47, 0.66)
Spouses
  Intercept         3.68 (-0.55, 7.92)        2.00 (-1.63, 5.64)         8.16 (4.26, 11.9)
  Patient CESD     -0.05 (-0.19, 0.09)        0.01 (-0.11, 0.13)         0.68 (0.63, 0.74)
  Spouse CESD       0.77 (0.64, 0.90)         0.78 (0.66, 0.89)          0.76 (0.71, 0.81)
  Patient MPI       0.43 (-0.29, 1.15)        0.27 (-0.27, 0.81)         0.53 (0.33, 0.73)
  Spouse MPI        0.55 (-0.22, 1.31)        0.58 (-0.04, 1.20)         0.36 (-0.64, 1.15)
  Cancer stage     -0.42 (-1.63, 0.79)       -0.21 (-1.23, 0.80)        -0.50 (-0.92, 0.09)

As shown in Table 3, the proposed method suggests significant "part- 
ner" effects for the patients. Specifically, the patient's depression increases 
with her spouse's MPI [estimate = 0.62 and 95% CI = (0.40, 0.84)] and pre- 
vious CESD [estimate = 0.14 and 95% CI = (0.09, 0.19)]. In addition, there 
are also significant "actor" effects for the patients, that is, the patient's 
depression is positively correlated with her own MPI and previous CESD 
scores. For the spouses, we observed similar significant "partner" effects: the 
spouse's depression increases with the patient's MPI and previous CESD 
scores. However, the "actor" effects for the spouses are different from those 
for the patients. The spouse's depression correlates with his previous CESD 
scores but not the MPI level, whereas the patient's depression is related 
to both variables. Based on these results, we can see that the patients and 
spouses are highly interdependent and influence each other's depression sta- 
tus. Therefore, when designing a prevention program to reduce depression 
in patients, we may achieve better outcomes by targeting both patients and 
spouses simultaneously. 
Table 4

Parameter estimates and 95% credible intervals (shown in parentheses) of the dropout
model for the breast cancer data

            Intercept               Current CESD         Previous CESD        MPI                  Cancer stage
Patients     -0.8 (-8.3, 6.2)       -1.6 (-4.2, -0.3)     0.6 (-0.3, 1.6)      0.8 (-1.6, 3.8)     -0.4 (-0.9, 2.4)
Spouses     -15.6 (-25.6, -4.1)      0.8 (-0.2, 1.6)     -0.7 (-1.6, 0.5)     -0.2 (-2.1, 1.4)      2.9 (-1.7, 6.4)


As for the dropout process, the results in Table 4 suggest that the missing 
data for the patients are nonignorable because the probability of dropout 
is significantly associated with the patient's current (missing) CESD score. 
In contrast, the missing data for the spouse appear to be ignorable, as
the probability of dropout does not depend on the spouse's current (missing)
CESD score. For the variance components, the estimates of the residual
variances for patients and spouses are $\sigma_1^2 = 5.02$ [95% CI = (2.98, 7.01)] and
$\sigma_2^2 = 6.12$ [95% CI = (4.03, 7.95)], respectively. The estimates of the
variances of the random effects $b_i$ and $c_i$ are $\tau_b^2 = 9.95$ [95% CI = (7.96, 11.92)]
and $\tau_c^2 = 7.97$ [95% CI = (5.99, 9.89)], respectively, suggesting substantial
variation across dyads.
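
To make the "Current CESD" coefficients in Table 4 easier to read, the following sketch converts them to odds ratios and checks whether each credible interval excludes zero. It assumes a logit link for the discrete-time dropout model, which this section does not state, and the scale on which CESD enters the model is not known here, so the numbers are indicative only. The names `odds_ratio` and `excludes_zero` are hypothetical helpers.

```python
import math

# "Current CESD" rows of Table 4: (estimate, lower, upper) of the 95% credible interval.
current_cesd = {
    "patients": (-1.6, -4.2, -0.3),
    "spouses": (0.8, -0.2, 1.6),
}

def odds_ratio(coef):
    """Change in dropout odds per one-unit increase, ASSUMING a logit link."""
    return math.exp(coef)

def excludes_zero(lower, upper):
    """True when the credible interval lies entirely on one side of zero."""
    return lower > 0 or upper < 0

summary = {
    member: (odds_ratio(est), excludes_zero(lower, upper))
    for member, (est, lower, upper) in current_cesd.items()
}
for member, (ratio, flagged) in summary.items():
    print(f"{member}: odds ratio {ratio:.2f}, interval excludes zero: {flagged}")
```

Under the logit assumption, the patients' estimate of -1.6 corresponds to the dropout odds being multiplied by roughly 0.2 per unit increase in the current CESD covariate, and only the patients' interval excludes zero.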

Compared to the proposed approach, both the complete-case and available- 
case analyses fail to detect some "partner" effects. For example, for spouses, 
the complete-case and available-case analyses assert that the spouse's CESD 
is correlated with his own previous CESD scores only, whereas the proposed 
method suggests that the spouse's CESD is related not only to his own previous
CESD but also to the patient's CESD and MPI level. In addition, for pa- 
tients, the "partner" effect of the spouse's CESD is not significant under 
the complete-case and available-case analyses, but is significant under the 
proposed approach. These results suggest that ignoring the nonignorable 
dropouts could lead to a failure to detect important covariate effects. 

Nonidentifiability is a common problem when modeling nonignorable missing
data. In our approach, the observed data contain very limited information
on the parameters that link the missing outcome with the dropout
process, that is, $\phi_1$ and $\phi_2$ in the dropout model. The identification of these
parameters is heavily driven by the untestable model assumptions [Verbeke
and Molenberghs (2000); Little and Rubin (2002)]. In this case, a sensible
strategy is to perform a sensitivity analysis to examine how the inference
changes with respect to the values of $\phi_1$ and $\phi_2$ [Daniels and Hogan (2000,
2008); Rotnitzky et al. (2001)]. We conducted a Bayesian sensitivity analysis
by assuming informative normal prior distributions for $\phi_1$ and $\phi_2$ with
a small variance of 0.01 and the mean fixed, successively, at various values.
Figures 1 and 2 show the parameter estimates of the measurement models
when the prior means of $\phi_1$ and $\phi_2$ vary from $-3$ to 3. In general, the
estimates were quite stable, except that the estimate of cancer stage in the
measurement model of patient (Figure 1) and the estimate of spouse's MPI
in the measurement model of spouse (Figure 2) demonstrated some variation.

[Figure 1. Panel title: Measurement Model of Patient. Horizontal axis: Prior Mean (-3 to 3). Legend: CESD patient, CESD spouse, MPI patient, MPI spouse, stage.]

Fig. 1. Sensitivity analysis of the proposed nonignorable model for the breast cancer
data. The figure shows the parameter estimates of the patients' measurement model under
informative normal priors for $\phi_1$ and $\phi_2$ with a mean varying from $-3$ to 3 and a fixed
variance of 0.01.
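
The logic of sweeping the prior mean of a weakly identified parameter can be mimicked in a toy conjugate setting. The sketch below is not the dropout model of this paper: it uses a single normal mean with known sampling variance, and the "data" summary (`data_mean`, `data_var`) is an assumed stand-in for a likelihood that carries little information about the parameter. It shows that a normal prior with variance 0.01 essentially fixes the parameter at the prior mean, so that stepping the prior mean from -3 to 3 amounts to examining the fit at a grid of fixed values.

```python
def normal_posterior(prior_mean, prior_var, data_mean, data_var):
    """Conjugate normal update for a mean with known sampling variance."""
    precision = 1.0 / prior_var + 1.0 / data_var
    post_mean = (prior_mean / prior_var + data_mean / data_var) / precision
    return post_mean, 1.0 / precision

prior_means = list(range(-3, 4))                    # -3, -2, ..., 3
weak_data = dict(data_mean=0.5, data_var=25.0)      # assumed: data barely inform the parameter

posterior_means = [
    normal_posterior(m, 0.01, **weak_data)[0] for m in prior_means
]
for m, pm in zip(prior_means, posterior_means):
    print(f"prior mean {m:+d} -> posterior mean {pm:+.3f}")
```

In the actual analysis the quantity of interest is how the measurement-model estimates move as the prior mean changes, which requires refitting the full model at each value.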

We conducted another sensitivity analysis on the prior distributions of $\sigma_1^2$,
$\sigma_2^2$, $\tau_b^2$ and $\tau_c^2$. We considered various inverse gamma priors, IG(a, b), by
setting a = b = 0.01, 1 and 5. As shown in Table 5, the estimates of the
measurement model parameters were stable under different prior distributions,
suggesting that the proposed method is not sensitive to the priors of these
parameters.
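
A rough intuition for why the inverse gamma hyperparameters matter little with a reasonable amount of data comes from the standard conjugate update for a normal variance. The sketch below is a simplified stand-in, not the full conditional used in the paper: it treats a set of simulated residuals as known, assumes the shape-scale parameterization of IG(a, b), and compares the posterior mean of the variance for a = b = 0.01, 1 and 5. The sample size and true variance are assumed toy values.

```python
import numpy as np

rng = np.random.default_rng(1)
residuals = rng.normal(0.0, np.sqrt(5.0), size=500)   # assumed toy residuals
n, ss = residuals.size, float(np.sum(residuals**2))

def ig_posterior_mean(a, b):
    """Posterior mean of sigma^2 under an IG(a, b) prior.

    With zero-mean normal residuals the posterior is IG(a + n/2, b + ss/2),
    whose mean is scale / (shape - 1).
    """
    shape, scale = a + n / 2.0, b + ss / 2.0
    return scale / (shape - 1.0)

post_means = {ab: ig_posterior_mean(ab, ab) for ab in (0.01, 1.0, 5.0)}
for ab, value in post_means.items():
    print(f"a = b = {ab}: posterior mean of variance = {value:.3f}")
```

Because the data contribute n/2 to the shape and ss/2 to the scale, hyperparameters of this size shift the posterior only slightly once n is in the hundreds.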

[Figure 2. Panel title: Measurement Model of Spouse. Horizontal axis: Prior Mean. Legend: CESD patient, CESD spouse, MPI patient, MPI spouse, stage.]

Fig. 2. Sensitivity analysis of the proposed nonignorable model for the breast cancer
data. The figure shows the parameter estimates of the spouses' measurement model under
informative normal priors for $\phi_1$ and $\phi_2$ with a mean varying from $-3$ to 3 and a fixed
variance of 0.01.

7. Conclusion. We have developed a selection-model-based approach to
analyze longitudinal dyadic data with nonignorable dropouts. We model the
longitudinal outcome process using a transition model and account for the
correlation within dyads using random effects. In the model, we allow a
subject's outcome to depend on not only his/her own characteristics but also the
characteristics of the other member in the dyad. As a result, the parameters
of the proposed model have appealing interpretations as "actor" and "partner"
effects, which greatly facilitates the understanding of interdependence
within a relationship and the design of more efficient prevention programs.
To account for the nonignorable dropout, we adopt a discrete-time survival
model to link the dropout process with the longitudinal measurement process.
We use the data augmentation method to address the complex missing data
problem caused by dropout and interdependence within dyads. The simulation
study shows that the proposed method yields consistent estimates with
correct coverage rates. We apply our methodology to the longitudinal dyadic
data collected from a breast cancer study. Our method identifies more "partner"
effects than the methods that ignore the missing data, thereby providing
extra insights into the interdependence of the dyads. For example, the
methods that ignore the missing data suggest that the spouse's CESD is
related only to his own previous CESD scores, whereas the proposed method
suggests that the spouse's CESD is related not only to his own previous CESD but
also to the patient's CESD and MPI level. This extra information can be
useful for the design of more efficient depression prevention programs for
breast cancer patients.

Table 5
Parameter estimates and 95% credible intervals (shown in parentheses) for the patient's
and spouse's measurement models by fixing a and b at 0.01, 1 and 5 for the inverse
gamma prior IG(a, b) on $\sigma_1^2$, $\sigma_2^2$, $\tau_b^2$ and $\tau_c^2$

                           a = b = 0.01          a = b = 1             a = b = 5
Patients   Intercept       4.72 (3.32,6.11)      5.00 (3.48,6.47)      5.02 (3.57,6.48)
           Patient CESD    0.87 (0.81,0.93)      0.86 (0.80,0.92)      0.88 (0.83,0.94)
           Spouse CESD     0.14 (0.09,0.19)      0.14 (0.08,0.19)      0.13 (0.08,0.18)
           Patient MPI     1.27 (0.84,1.71)      1.12 (0.67,1.60)      1.20 (0.85,1.57)
           Spouse MPI      0.71 (0.49,0.91)      0.68 (0.46,0.87)      0.61 (0.39,0.82)
           Cancer stage    -0.03 (-0.50,0.50)    0.18 (-0.31,0.65)     -0.08 (-0.57,0.40)
Spouses    Intercept       6.40 (4.39,8.41)      7.56 (5.35,9.93)      7.52 (5.43,9.55)
           Patient CESD    0.67 (0.62,0.73)      0.67 (0.62,0.72)      0.69 (0.64,0.73)
           Spouse CESD     0.76 (0.71,0.80)      0.75 (0.71,0.81)      0.75 (0.71,0.80)
           Patient MPI     0.51 (0.32,0.71)      0.54 (0.35,0.73)      0.53 (0.34,0.72)
           Spouse MPI      0.79 (-0.05,1.46)     0.54 (-0.03,1.06)     0.45 (-0.23,1.09)
           Cancer stage    -0.41 (-0.86,0.02)    -0.38 (-0.81,0.03)    -0.48 (-0.87,0.08)

In the proposed dropout model (3.2), we assume that time-dependent covariates
$X_{ijk}$ and $Y_{ijk}$, k = 1, 2, have captured all important time-dependent
factors that influence dropout. However, this assumption may not always be
true. A more flexible approach is to include in the model a time-dependent
random effect $c_{ij}$ to represent all unmeasured time-variant factors that
influence dropout. We can further put a hierarchical structure on $c_{ij}$ to shrink it
toward a dyad-level time-invariant random effect $c_i$ to account for the effects
of unmeasured time-invariant factors on dropout. In addition, in (3.2), in
order to allow members in a dyad to drop out at different times, we specify
separate dropout models for each dyadic member, linked by a common
random effect. Although the common random effect makes the members in
a dyad more likely to drop out at the same time, it may not be the most
effective modeling approach when dropout mostly occurs at the dyad level.
In this case, a more effective approach is, in addition to the dyad-level
random effect, to put a hierarchical structure on the coefficients of
common covariates (in the two dropout models) to shrink them toward a common
value, reflecting that dropout almost always occurs at the dyad level.
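
One possible way to write the hierarchical structure on the time-dependent random effect is sketched below, denoting that effect by $c_{ij}$ and the dyad-level effect by $c_i$. The normal distributions, the number of visits $J$ and the variance symbol $\omega^2$ are assumptions made for illustration; the text above does not specify them.

```latex
c_{ij} \mid c_i \sim N(c_i, \omega^2), \qquad
c_i \sim N(0, \tau_c^2), \qquad j = 1, \dots, J .
```

Under this sketch, a small $\omega^2$ pulls every $c_{ij}$ toward $c_i$, so the time-invariant random-effect model is recovered as $\omega^2 \to 0$, while a larger $\omega^2$ lets unmeasured time-varying factors shift the dropout probability from visit to visit.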